Abstract

The Bonami-Beckner hypercontractive inequality is a powerful tool in Fourier analysis of real-valued 
functions on the Boolean cube. In this paper we present a version of this inequality for matrix-valued 
functions on the Boolean cube. Its proof is based on a powerful inequality by Ball, Carlen, and Lieb. We 
also present a number of applications. First, we analyze maps that encode n classical bits into m qubits, 
in such a way that each set of k bits can be recovered with some probability by an appropriate measurement
on the quantum encoding; we show that if m < 0.7n, then the success probability is exponentially
small in k. This result may be viewed as a direct product version of Nayak's quantum random access
code bound. It in turn implies strong direct product theorems for the one-way quantum communication
complexity of Disjointness and other problems. Second, we prove that error-correcting codes that are
locally decodable with 2 queries require length exponential in the length of the encoded string. This
gives what is arguably the first "non-quantum" proof of a result originally derived by Kerenidis and de
Wolf using quantum information theory, and answers a question by Trevisan.

1 Introduction

1.1 A hypercontractive inequality for matrix-valued functions

Fourier analysis of real-valued functions on the Boolean cube has been widely used in the theory of computing.
Applications include analyzing the influence of variables on Boolean functions [30], probabilistically
checkable proofs and associated hardness of approximation [23], analysis of threshold phenomena [31],
noise stability [43, 48], voting schemes [50], learning under the uniform distribution [41, 42, 27, 44],
communication complexity [51, 34, 18], etc.

One of the main technical tools in this area is a hypercontractive inequality that is sometimes called the
Bonami-Beckner inequality [10, 6], though its history would also justify other names (see Lecture 16 of [49]
for some background and history). For a fixed $\rho \in [0, 1]$, consider the linear operator $T_\rho$ on the space of all
functions $f : \{0,1\}^n \to \mathbb{R}$ defined by

$$(T_\rho(f))(x) = \mathbb{E}_y[f(y)],$$

where the expectation is taken over $y$ obtained from $x$ by negating each bit independently with probability
$(1-\rho)/2$. In other words, the value of $T_\rho(f)$ at a point $x$ is obtained by averaging the values of $f$ over
a certain neighborhood of $x$. One important property of $T_\rho$ for $\rho < 1$ is that it has a "smoothing" effect:
any "high peaks" present in $f$ are smoothed out in $T_\rho(f)$. The hypercontractive inequality formalizes this
intuition. To state it precisely, define the $p$-norm of a function $f$ by
$\|f\|_p = \big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} |f(x)|^p\big)^{1/p}$. It is not
difficult to prove that the norm is nondecreasing with $p$. Also, the higher $p$ is, the more sensitive the norm
becomes to peaks in the function $f$. The hypercontractive inequality says that for certain $q > p$, the $q$-norm
of $T_\rho(f)$ is upper bounded by the $p$-norm of $f$. This exactly captures the intuition that $T_\rho(f)$ is a smoothed
version of $f$: even though we are considering a higher norm, the norm does not increase. More precisely,
the hypercontractive inequality says that as long as $1 \le p \le q$ and $\rho \le \sqrt{(p-1)/(q-1)}$, we have

$$\|T_\rho(f)\|_q \le \|f\|_p. \qquad (1)$$
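As a sanity check of inequality (1), the following minimal Python sketch computes the noise operator directly from its definition on a small cube and compares the two norms. The helper names (`noise_operator`, `norm`), the choice n = 3, and the random test function are illustrative assumptions, not part of the original argument.

```python
import itertools
import random

def noise_operator(f, n, rho):
    """T_rho(f): average f(y) over y obtained from x by flipping each bit w.p. (1 - rho)/2."""
    flip = (1 - rho) / 2
    cube = list(itertools.product((0, 1), repeat=n))
    out = {}
    for x in cube:
        total = 0.0
        for y in cube:
            dist = sum(a != b for a, b in zip(x, y))  # number of flipped bits
            total += flip ** dist * (1 - flip) ** (n - dist) * f[y]
        out[x] = total
    return out

def norm(f, n, p):
    """Normalized p-norm ((1/2^n) * sum_x |f(x)|^p)^(1/p)."""
    return (sum(abs(v) ** p for v in f.values()) / 2 ** n) ** (1 / p)

random.seed(0)
n, p, q = 3, 1.5, 2.0
rho = ((p - 1) / (q - 1)) ** 0.5          # largest rho allowed by inequality (1)
f = {x: random.uniform(-1, 1) for x in itertools.product((0, 1), repeat=n)}
lhs = norm(noise_operator(f, n, rho), n, q)   # q-norm of the smoothed function
rhs = norm(f, n, p)                           # p-norm of the original function
print(lhs <= rhs + 1e-12)
```

The brute-force double loop costs 4^n operations, so this sketch is only meant for very small n.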

The most interesting case for us is when $q = 2$, since in this case one can view the inequality as a
statement about the Fourier coefficients of $f$, as we describe next. Let us first recall some basic definitions
from Fourier analysis. For every $S \subseteq [n]$ (which by some abuse of notation we will also view as an $n$-bit
string) and $x \in \{0,1\}^n$, define $\chi_S(x) = (-1)^{x \cdot S}$ to be the parity of the bits of $x$ indexed by $S$. The Fourier
transform of a function $f : \{0,1\}^n \to \mathbb{R}$ is the function $\hat{f} : \{0,1\}^n \to \mathbb{R}$ defined by

$$\hat{f}(S) = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x)\chi_S(x).$$

The values $\hat{f}(S)$ are called the Fourier coefficients of $f$. The coefficient $\hat{f}(S)$ may be viewed as measuring
the correlation between $f$ and the parity function $\chi_S$. Since the functions $\chi_S$ form an orthonormal basis of
the space of all functions from $\{0,1\}^n$ to $\mathbb{R}$, we can express $f$ in terms of its Fourier coefficients as

$$f = \sum_{S \subseteq [n]} \hat{f}(S)\chi_S. \qquad (2)$$

Using the same reasoning we obtain Parseval's identity,

$$\|f\|_2 = \Big(\sum_{S \subseteq [n]} \hat{f}(S)^2\Big)^{1/2}.$$

The operator $T_\rho$ has a particularly elegant description in terms of the Fourier coefficients. Namely, it simply
multiplies each Fourier coefficient $\hat{f}(S)$ by a factor of $\rho^{|S|}$:

$$T_\rho(f) = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S)\chi_S.$$
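Parseval's identity and the Fourier description of the noise operator can both be checked numerically on a small example. The sketch below is only an illustration: the function names, the example function, and the parameters n = 3 and rho = 0.6 are assumptions chosen for the demonstration.

```python
import itertools

def chi(S, x):
    """Parity character chi_S(x) = (-1)^(x . S); S and x are 0/1 tuples."""
    return -1 if sum(s * b for s, b in zip(S, x)) % 2 else 1

def fourier(f, n):
    """Fourier coefficients hat f(S) = (1/2^n) * sum_x f(x) * chi_S(x)."""
    cube = list(itertools.product((0, 1), repeat=n))
    return {S: sum(f[x] * chi(S, x) for x in cube) / 2 ** n for S in cube}

def noise_operator(f, n, rho):
    """T_rho(f) computed from its definition as an average over noisy copies of x."""
    flip = (1 - rho) / 2
    cube = list(itertools.product((0, 1), repeat=n))
    return {
        x: sum(
            flip ** sum(a != b for a, b in zip(x, y))
            * (1 - flip) ** sum(a == b for a, b in zip(x, y))
            * f[y]
            for y in cube
        )
        for x in cube
    }

n, rho = 3, 0.6
cube = list(itertools.product((0, 1), repeat=n))
f = {x: x[0] + 2 * x[1] * x[2] - 0.5 for x in cube}   # arbitrary example function
fhat = fourier(f, n)
smoothed_hat = fourier(noise_operator(f, n, rho), n)

# Parseval: (1/2^n) * sum_x f(x)^2 equals the sum of squared Fourier coefficients.
parseval_gap = abs(sum(v * v for v in f.values()) / 2 ** n - sum(c * c for c in fhat.values()))
# T_rho multiplies hat f(S) by rho^{|S|}; sum(S) is |S| for a 0/1 indicator tuple.
multiplier_gap = max(abs(smoothed_hat[S] - rho ** sum(S) * fhat[S]) for S in cube)
print(parseval_gap < 1e-12, multiplier_gap < 1e-12)
```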

The higher $|S|$ is, the stronger the Fourier coefficient $\hat{f}(S)$ is "attenuated" by $T_\rho$. Using Parseval's identity,
we can now write the hypercontractive inequality (1) for the case $q = 2$ as follows. For every $p \in [1, 2]$,

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \hat{f}(S)^2\Big)^{1/2} \le \Big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} |f(x)|^p\Big)^{1/p}. \qquad (3)$$

This gives an upper bound on a weighted sum of the squared Fourier coefficients of $f$, where each coefficient
is attenuated by a factor $(p-1)^{|S|}$. We are interested in generalizing this hypercontractive inequality to
matrix-valued functions. Let $\mathcal{M}$ be the space of $d \times d$ complex matrices and suppose we have a function
$f : \{0,1\}^n \to \mathcal{M}$. For example, a natural scenario where this arises is in quantum information theory, if
we assign to every $x \in \{0,1\}^n$ some $m$-qubit density matrix $f(x)$ (so $d = 2^m$). We define the Fourier
transform $\hat{f}$ of a matrix-valued function $f$ exactly as before:

$$\hat{f}(S) = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x)\chi_S(x).$$

The Fourier coefficients $\hat{f}(S)$ are now also $d \times d$ matrices. An equivalent definition is by applying the
standard Fourier transform to each $i,j$-entry separately: $\hat{f}(S)_{ij} = \widehat{f(\cdot)_{ij}}(S)$. This extension of the Fourier
transform to matrix-valued functions is quite natural, and has also been used in, e.g., [46, 17].

Our main tool, which we prove in Section 3, is an extension of the hypercontractive inequality to matrix-valued
functions. For $M \in \mathcal{M}$ with singular values $\sigma_1, \ldots, \sigma_d$, we define its (normalized Schatten) $p$-norm
as $\|M\|_p = \big(\frac{1}{d}\sum_{i=1}^{d} \sigma_i^p\big)^{1/p}$.

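Theorem 1. For every $f : \{0,1\}^n \to \mathcal{M}$ and $1 \le p \le 2$,

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \|\hat{f}(S)\|_p^2\Big)^{1/2} \le \Big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} \|f(x)\|_p^p\Big)^{1/p}.$$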

This is the analogue of Eq. (3), the $q = 2$ case of the hypercontractive inequality stated above, for matrix-valued
functions, with $p$-norms replacing absolute values. The
case $n = 1$ can be seen as a geometrical statement that extends the familiar parallelogram law in Euclidean
geometry and is closely related to the notion of uniform convexity. This case was first proven for certain
values of $p$ by Tomczak-Jaegermann [54] and then in full generality by Ball, Carlen, and Lieb [4]. Among
its applications are the work of Carlen and Lieb on fermion fields [14], and the more recent work of Lee and
Naor on metric embeddings [38].

To the best of our knowledge, the general case $n > 1$ has not appeared before.^1 Its proof is not difficult,
and follows by induction on $n$, similar to the proof of the usual hypercontractive inequality.^2 Although
one might justly regard Theorem 1 as a "standard" corollary of the result by Ball, Carlen, and Lieb, such
"tensorized inequalities" tend to be extremely useful (see, e.g., ©Ell) and we believe that the matrix-valued
hypercontractive inequality will have more applications in the future.

^1 A different generalization of the Bonami-Beckner inequality was given by Borell [11]. His generalization, however, is an easy
corollary of the Bonami-Beckner inequality and is therefore relatively weak (although it does apply to any Banach space, and not
just to the space of matrices with the Schatten $p$-norm).

^2 We remark that Carlen and Lieb's proof in [14] also uses induction and has some superficial resemblance to the proof given
here. Their induction, however, is on the dimension of the matrices (or more precisely, the number of fermions), and moreover
leads to an entirely different inequality.

1.2 Application: k-out-of-n random access codes

Our main application of Theorem 1 is for the following information-theoretic problem. Suppose we want
to encode an $n$-bit string $x$ into $m$ bits or qubits, in such a way that for any set $S \subseteq [n]$ of $k$ indices, the
$k$-bit substring $x_S$ can be recovered with probability at least $p$ by making an appropriate measurement on
the encoding. We are allowed to use probabilistic encodings here, so the encoding need not be a function
mapping $x$ to a fixed classical string or a fixed quantum pure state. We will call such encodings k-out-of-n
random access codes, since they allow us to access any set of $k$ out of $n$ bits. As far as we know, for $k > 1$
neither the classical nor the quantum case has been studied before. Here we focus on the quantum case,
because our lower bounds for quantum encodings of course also apply to classical encodings.

We are interested in the tradeoff between the length $m$ of the quantum random access code, and the
success probability $p$. Clearly, if $m \ge n$ then we can just use the identity encoding to obtain $p = 1$.
If $m < n$ then by Holevo's theorem [25] our encoding will be "lossy", and $p$ will be less than 1. The
case $k = 1$ was first studied by Ambainis et al. [2], who showed that if $p$ is bounded away from 1/2, then
$m = \Omega(n/\log n)$. Nayak [45] subsequently strengthened this bound to $m \ge (1 - H(p))n$, where $H(\cdot)$ is the
binary entropy function. This bound is optimal up to an additive $\log n$ term both for classical and quantum
encodings. The intuition of Nayak's proof is that, for average $i$, the encoding only contains $m/n < 1$ bits
of information about the bit $x_i$, which limits our ability to predict $x_i$ given the encoding.

Now suppose that $k > 1$, and $m$ is much smaller than $n$. Clearly, for predicting one specific bit $x_i$,
with $i$ uniformly chosen, Nayak's result applies, and we will have a success probability that is bounded
away from 1. But intuitively this should apply to each of the $k$ bits that we need to predict. Moreover,
these $k$ success probabilities should not be very correlated, so we expect an overall success probability that
is exponentially small in $k$. Nayak's proof does not generalize to the case $k \gg 1$ (or at least, we do not
know how to do it). The reason it fails is the following. Suppose we probabilistically encode $x \in \{0,1\}^n$
as follows: with probability 1/4 our encoding is $x$ itself, and with probability 3/4 our encoding is the empty
string. Then the average length of the output (and hence the entropy or amount of information in the
encoding) is only $n/4$ bits, or 1/4 bit for an average $x_i$. Yet from this encoding one can predict all of $x$
with success probability 1/4! Hence, if we want to prove our intuition, we should make use of the fact that
the encoding is always confined to a $2^m$-dimensional space (a property which the above example lacks).
Arguments based on von Neumann entropy, such as the one of [45], do not seem capable of capturing this
condition (however, a min-entropy argument recently enabled König and Renner to prove a closely related
but incomparable result, see below). The new hypercontractive inequality offers an alternative approach; in
fact, it is the only alternative approach to entropy-based methods that we are aware of in quantum information.
Applying the inequality to the matrix-valued function that gives the encoding implies $p \le 2^{-\Omega(k)}$ if $m \ll n$.
More precisely:

Theorem 2. For any $\eta > 2\ln 2$ there exists a constant $C_\eta$ such that if $n/k$ is large enough then for any
$k$-out-of-$n$ quantum random access code on $m$ qubits, the success probability satisfies

$$p \le C_\eta \Big(\frac{1}{2} + \frac{1}{2}\sqrt{\frac{\eta m}{n}}\Big)^{k}.$$

In particular, the success probability is exponentially small in $k$ if $m/n < 1/(2\ln 2) \approx 0.721$. Notice
that for very small $m/n$ the bound on $p$ gets close to $2^{-k}$, which is what one gets by guessing the $k$-bit
answer randomly. We also obtain bounds if $k$ is close to $n$, but these are a bit harder to state. We believe
that the theorem can be extended to the case that $m/n > 1/(2\ln 2)$, although proving this would probably
require a strengthening of the inequality by Ball, Carlen, and Lieb. Luckily, in all our applications we are
free to choose a small enough $m$. Finally, we note that in contrast to Nayak's approach, our proof does not
use the strong subadditivity of von Neumann entropy.

The classical case. We now give a few comments regarding the special case of classical (probabilistic)
$m$-bit encodings. First, in this case the encodings are represented by diagonal matrices. For such matrices,
the base case $n = 1$ of Theorem 1 can be derived directly from the Bonami-Beckner inequality, without
requiring the full strength of the Ball-Carlen-Lieb inequality (see [4] for details). Alternatively, one can
derive Theorem 2 in the classical case directly from the Bonami-Beckner inequality by conditioning on a
fixed $m$-bit string of the encoding (this step is already impossible in the quantum case) and then analyzing
the resulting distribution on $\{0,1\}^n$. This proof is very similar to the one we give in Section 4 (and in fact
slightly less elegant due to the conditioning step) and we therefore omit the details.

Interestingly, in the classical case there is a simpler argument that avoids Bonami-Beckner altogether.
This argument was used in [56] and was communicated to us by the authors of that paper. We briefly sketch
it here. Suppose we have a classical (possibly randomized) $m$-bit encoding that allows us to recover any $k$-bit
set with probability at least $p$ using a (possibly randomized) decoder. By Yao's minimax principle, there
is a way to fix the randomness in both the encoding and decoding procedures, such that the probability of
succeeding in recovering all $k$ bits of a randomly chosen $k$-set from an encoding of a uniformly random
$x \in \{0,1\}^n$ is at least $p$. So now we have deterministic encoding and decoding, but there is still randomness
in the input $x$. Call an $x$ "good" if the probability of the decoding procedure being successful on a random
$k$-tuple is at least $p/2$ (given the $m$-bit encoding of that $x$). By Markov's inequality, at least a $p/2$-fraction
of the inputs $x$ are good. Now consider the following experiment. Given the encoding of a uniform $x$,
we take $\ell = 100n/k$ uniformly and independently chosen $k$-sets and apply the decoding procedure to all
of them. We then output an $n$-bit string with the "union" of all the answers we received (if we received
multiple contradictory answers for the same bit, we can put either answer there), and random bits for the
positions that are not in the union. With probability $p/2$, $x$ is good. Conditioned on this, with probability
at least $(p/2)^{\ell}$ all our decodings are correct. Moreover, except with probability $2^{-\Omega(n)}$, the union of our $\ell$
$k$-sets is of size at least $0.9n$. The probability of guessing the remaining $n/10$ bits right is $2^{-n/10}$. Therefore
the probability of successfully recovering all of $x$ is at least $(p/2) \cdot ((p/2)^{\ell} - 2^{-\Omega(n)}) \cdot 2^{-n/10}$. A simple
counting argument shows that this is impossible unless $p \le 2^{-\Omega(k)}$ or $m$ is close to $n$. This argument does
not work for quantum encodings, of course, because these cannot just be reused (a quantum measurement
changes the state).

The König-Renner result. Independently but subsequent to our work (which first appeared on the arXiv
preprint server in May 2007), König and Renner [36] recently used sophisticated quantum information
theoretic arguments to show a result with a similar flavor to ours. Each of the results is tuned for different
scenarios. In particular, the results are incomparable, and our applications to direct product theorems do not
follow from their result, nor do their applications follow from our result. We briefly describe their result and
explain the distinction between the two.

Let $X = X_1, \ldots, X_n$ be classical random variables, not necessarily uniformly distributed or even
independent. Suppose that each $X_i \in \{0,1\}^b$. Suppose further that the "smooth min-entropy of $X$ relative
to a quantum state $\rho$" is at least some number $h$ (see [36] for the precise definitions, which are quite
technical). If we randomly pick $r$ distinct indices $i_1, \ldots, i_r$, then intuitively the smooth min-entropy of
$X' = X_{i_1}, \ldots, X_{i_r}$ relative to $\rho$ should not be much smaller than $hr/n$. König and Renner show that if $b$ is
larger than $n/r$ then this is indeed the case, except with probability exponentially small in $r$. Note that they
are picking $b$-bit blocks $X_{i_1}, \ldots, X_{i_r}$ instead of individual bits, but this can also be viewed as picking (not
quite uniformly) $k = rb$ bits from a string of $nb$ bits.

On the one hand, the constants in their bounds are essentially optimal, while ours are a factor $2\ln 2$
off from what we expect they should be. Also, while they need very few assumptions on the random
variables $X_1, \ldots, X_n$ and on the quantum encoding, we assume the random variables are uniformly distributed
bits, and our quantum encoding is confined to a $2^m$-dimensional space. We can in fact slightly relax both
the assumption on the input and the encoding, but do not discuss these relaxations since they are of less
interest to us. Finally, their result still works if the indices $i_1, \ldots, i_r$ are not sampled uniformly, but are
sampled in some randomness-efficient way. This allows them to obtain efficient key-agreement schemes in
a cryptographic model where the adversary can only store a bounded number of quantum bits.

On the other hand, our result works even if only a small number of bits is sampled, while theirs only
kicks in when the number of bits being sampled ($k = rb$) is at least the square-root of the total number of
bits $nb$. This is not very explicit in their paper, but can be seen by observing that the parameter $\kappa = n/(rb)$
on page 8 and in Corollary 6.19 needs to be at most a constant (whence the assumption that $b$ is larger than
$n/r$). So the total number of bits is $nb = O(rb^2) = O(r^2b^2) = O(k^2)$. Since we are interested in small as
well as large $k$, this limitation of their approach is significant. A final distinction between the results is in
the length of the proof. While the information-theoretic intuition in their paper is clear and well-explained,
the details get to be quite technical, resulting in a proof which is significantly longer than ours.

1.3 Application: Direct product theorem for one-way quantum communication complexity

Our result for $k$-out-of-$n$ random access codes has the flavor of a direct product theorem: the success
probability of performing a certain task on $k$ instances (i.e., $k$ distinct indices) goes down exponentially
with $k$. In Section 5 we use this to prove a new strong direct product theorem for one-way communication
complexity.

Consider the 2-party Disjointness function: Alice receives input $x \in \{0,1\}^n$, Bob receives input
$y \in \{0,1\}^n$, and they want to determine whether the sets represented by their inputs are disjoint, i.e. whether
$x_i y_i = 0$ for all $i \in [n]$. They want to do this while communicating as few qubits as possible (allowing some
small error probability, say 1/3). We can either consider one-way protocols, where Alice sends one message
to Bob who then computes the output; or two-way protocols, which are interactive. The quantum
communication complexity of Disjointness is fairly well understood: it is $\Theta(n)$ qubits for one-way protocols [13],
and $\Theta(\sqrt{n})$ qubits for two-way protocols [12, 26, 1, 52].

Now consider the case of $k$ independent instances: Alice receives inputs $x_1, \ldots, x_k$ (each of $n$ bits),
Bob receives $y_1, \ldots, y_k$, and their goal is to compute all $k$ bits $\mathrm{DISJ}_n(x_1, y_1), \ldots, \mathrm{DISJ}_n(x_k, y_k)$. Klauck
et al. [35] proved an optimal direct product theorem for two-way quantum communication: every protocol
that communicates fewer than $\alpha k\sqrt{n}$ qubits (for some small constant $\alpha > 0$) will have a success probability
that is exponentially small in $k$. Surprisingly, prior to our work no strong direct product theorem was known
for the usually simpler case of one-way communication, not even for classical one-way communication.^3
In Section 5 we derive such a theorem from our $k$-out-of-$n$ random access code lower bound: if $\eta > 2\ln 2$,
then every one-way quantum protocol that sends fewer than $kn/\eta$ qubits will have success probability at
most $2^{-\Omega(k)}$.

^3 Recently and independently of our work, Jain et al. [28] did manage to prove such a direct product theorem for classical
one-way communication, based on information-theoretic techniques.

These results can straightforwardly be generalized to get a bound for all functions in terms of their
VC-dimension. If $f$ has VC-dimension $d$, then any one-way quantum protocol for computing $k$ independent
copies of $f$ that sends $kd/\eta$ qubits, has success probability $2^{-\Omega(k)}$. For simplicity, Section 5 only presents
the case of Disjointness. Finally, by the work of Beame et al. [5], such direct product theorems imply
lower bounds on 3-party protocols where the first party sends only one message. We elaborate on this in
Appendix A.

1.4 Application: Locally decodable codes

A locally decodable error-correcting code (LDC) $C : \{0,1\}^n \to \{0,1\}^N$ encodes $n$ bits into $N$ bits,
in such a way that each encoded bit can be recovered from a noisy codeword by a randomized decoder
that queries only a small number $q$ of bit-positions in that codeword. Such codes have applications in a
variety of different complexity-theoretic and cryptographic settings; see for instance Trevisan's survey and
the references therein [55]. The main theoretical issue in LDCs is the tradeoff between $q$ and $N$. The
best known constructions of LDCs with constant $q$ have a length $N$ that is sub-exponential in $n$ but still
superpolynomial [16, 7, 59]. On the other hand, the only superpolynomial lower bound known for general
LDCs is the tight bound $N = 2^{\Omega(n)}$ for $q = 2$ due to Kerenidis and de Wolf [33] (generalizing an earlier
exponential lower bound for linear codes by [19]). Rather surprisingly, the proof of [33] relied heavily
on techniques from quantum information theory: despite being a result purely about classical codes and
classical decoders, the quantum perspective was crucial for their proof. In particular, they show that the two
queries of a classical decoder can be replaced by one quantum query, then they turn this quantum query into
a random access code for the encoded string $x$, and finally invoke Nayak's lower bound for quantum random
access codes.

In Section 6 we reprove an exponential lower bound on $N$ for the case $q = 2$ without invoking any
quantum information theory: we just use classical reductions, matrix analysis, and the hypercontractive
inequality for matrix-valued functions. Hence it is a classical (non-quantum) proof as asked for by
Trevisan [55, Open question 3 in Section 3.6].^4 It should be noted that this new proof is still quite close in spirit
(though not terminology) to the quantum proof of [33]. This is not too surprising given the fact that the
proof of [33] uses Nayak's lower bound on random access codes, generalizations of which follow from the
hypercontractive inequality. We discuss the similarities and differences between the two proofs in Section 6.

We feel the merit of this new approach is not so much in giving a partly new proof of the known
lower bound on 2-query LDCs, but in its potential application to codes with more than 2 queries. Recently
Yekhanin [59] constructed 3-query LDCs with $N = 2^{O(n^{1/32582657})}$ (and $N = 2^{n^{O(1/\log\log n)}}$ for infinitely
many $n$ if there exist infinitely many Mersenne primes). For $q = 3$, the best known lower bounds on $N$ are
slightly less than $n^2$ [32, 33, 58]. Despite considerable effort, this gap still looms large. Our hope is that
our approach can be generalized to 3 or more queries. Specifically, what we would need is a generalization
of tensors of rank 2 (i.e., matrices) to tensors of rank $q$; an appropriate tensor norm; and a generalization of
the hypercontractive inequality from matrix-valued to tensor-valued functions. Some preliminary progress
towards this goal was obtained in [24].

^4 Alex Samorodnitsky has been developing a classical proof along similar lines in the past two years. However, as he told us at
the time of writing [53], his proof is still incomplete.

2 Preliminaries

Norms: Recall that we define the $p$-norm of a $d$-dimensional vector $v$ by

$$\|v\|_p = \Big(\frac{1}{d}\sum_{i=1}^{d} |v_i|^p\Big)^{1/p}.$$

We extend this to matrices by defining the (normalized Schatten) $p$-norm of a matrix $A \in \mathbb{C}^{d \times d}$ as

$$\|A\|_p = \Big(\frac{1}{d}\,\mathrm{Tr}\,(A^\dagger A)^{p/2}\Big)^{1/p}.$$

This is equivalent to the $p$-norm of the vector of singular values of $A$. For diagonal matrices this definition
coincides with the one for vectors. For convenience we defined all norms to be under the normalized
counting measure, even though for matrices this is nonstandard. The advantage of the normalized norm is
that it is nondecreasing with $p$. We also define the trace norm $\|A\|_{tr}$ of a matrix $A$ as the sum of its singular
values, hence we have $\|A\|_{tr} = d\|A\|_1$ for any $d \times d$ matrix $A$.
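The two properties just stated (monotonicity of the normalized norm in $p$, and the relation between the trace norm and the normalized 1-norm) can be illustrated on a vector of singular values. The following is a small sketch; the function name and the particular singular values are hypothetical choices for the illustration.

```python
def normalized_norm(values, p):
    """Normalized p-norm ((1/d) * sum_i |v_i|^p)^(1/p) of a length-d list."""
    d = len(values)
    return (sum(abs(v) ** p for v in values) / d) ** (1 / p)

# Hypothetical singular values of a 4 x 4 matrix (equivalently, a diagonal matrix).
singular_values = [3.0, 1.0, 0.5, 0.0]
d = len(singular_values)

# The normalized norm should be nondecreasing in p.
norms = [normalized_norm(singular_values, p) for p in (1, 1.5, 2, 4, 8)]
is_nondecreasing = all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))

# Trace norm (sum of singular values) equals d times the normalized 1-norm.
trace_norm = sum(singular_values)
relation_gap = abs(trace_norm - d * normalized_norm(singular_values, 1))
print(is_nondecreasing, relation_gap < 1e-12)
```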



Quantum states: An $m$-qubit pure state is a superposition $|\phi\rangle = \sum_{z \in \{0,1\}^m} \alpha_z |z\rangle$ over all classical $m$-bit
states. The $\alpha_z$'s are complex numbers called amplitudes, and $\sum_z |\alpha_z|^2 = 1$. Hence a pure state $|\phi\rangle$ is a
unit vector in $\mathbb{C}^{2^m}$. Its complex conjugate (a row vector with entries conjugated) is denoted $\langle\phi|$. The inner
product between $|\phi\rangle = \sum_z \alpha_z |z\rangle$ and $|\psi\rangle = \sum_z \beta_z |z\rangle$ is the dot product
$\langle\phi| \cdot |\psi\rangle = \langle\phi|\psi\rangle = \sum_z \alpha_z^* \beta_z$. An
$m$-qubit mixed state (or density matrix) $\rho = \sum_i p_i |\phi_i\rangle\langle\phi_i|$ corresponds to a probability distribution over
$m$-qubit pure states, where $|\phi_i\rangle$ is given with probability $p_i$. The eigenvalues $\lambda_1, \ldots, \lambda_d$ of $\rho$ are non-negative
reals that sum to 1, so they form a probability distribution. If $\rho$ is pure then one eigenvalue is 1 while all
others are 0. Hence for any $p \ge 1$, the maximal $p$-norm is achieved by pure states:

$$\|\rho\|_p^p = \frac{1}{d}\sum_{i=1}^{d} \lambda_i^p \le \frac{1}{d}\sum_{i=1}^{d} \lambda_i = \frac{1}{d}. \qquad (4)$$

A $k$-outcome positive operator-valued measurement (POVM) is given by $k$ positive semidefinite
operators $E_1, \ldots, E_k$ with the property that $\sum_{i=1}^{k} E_i = I$. When this POVM is applied to a mixed state $\rho$, the
probability of the $i$th outcome is given by the trace $\mathrm{Tr}(E_i\rho)$. The following well known fact gives the close
relationship between trace distance and distinguishability of density matrices:

Fact 3. The best possible measurement to distinguish two density matrices $\rho_0$ and $\rho_1$ has bias $\frac{1}{2}\|\rho_0 - \rho_1\|_{tr}$.

Here "bias" is defined as twice the success probability, minus 1. We refer to Nielsen and Chuang [47]
for more details.
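In the classical special case, where both density matrices are diagonal (i.e., probability distributions), Fact 3 can be confirmed by brute force. The sketch below assumes the unknown state is $\rho_0$ or $\rho_1$ with probability 1/2 each, and only searches over deterministic classical guessing rules, which suffices for diagonal states; the two distributions are hypothetical examples.

```python
import itertools

# Two hypothetical classical states, written as the diagonals of density matrices.
rho0 = [0.5, 0.3, 0.2, 0.0]
rho1 = [0.1, 0.3, 0.2, 0.4]

# For diagonal matrices the trace norm is the sum of absolute diagonal entries,
# so Fact 3 predicts a best bias of (1/2) * sum_z |rho0[z] - rho1[z]|.
predicted_bias = 0.5 * sum(abs(a - b) for a, b in zip(rho0, rho1))

# Brute force over all deterministic rules "outcome z -> guess g[z]"; the hidden
# state is rho0 or rho1 with probability 1/2 each.
best_success = max(
    0.5 * sum((rho0[z], rho1[z])[g[z]] for z in range(4))
    for g in itertools.product((0, 1), repeat=4)
)
best_bias = 2 * best_success - 1   # bias = twice the success probability, minus 1
print(abs(best_bias - predicted_bias) < 1e-12)
```

For genuinely quantum (non-commuting) states the optimal measurement is not a classical guessing rule, so this check does not cover the general statement.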



3 The hypercontractive inequality for matrix-valued functions

Here we prove Theorem 1. The proof relies on the following powerful inequality by Ball et al. [4] (they state
this inequality for the usual unnormalized Schatten $p$-norm, but both statements are clearly equivalent).

Lemma 4. ([4, Theorem 1]) For any matrices $A, B$ and any $1 \le p \le 2$, it holds that

$$\Big(\Big\|\frac{A+B}{2}\Big\|_p^2 + (p-1)\Big\|\frac{A-B}{2}\Big\|_p^2\Big)^{1/2} \le \Big(\frac{\|A\|_p^p + \|B\|_p^p}{2}\Big)^{1/p}.$$
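Lemma 4 can be spot-checked numerically on random real $2 \times 2$ matrices, for which singular values have a closed form. This is only an illustrative sketch under those assumptions (real entries, $d = 2$, Gaussian samples); the helper names are hypothetical and a random test is of course no substitute for the proof in [4].

```python
import math
import random

def singular_values_2x2(M):
    """Singular values of a real 2 x 2 matrix from the eigenvalues of M^T M."""
    (a, b), (c, d) = M
    t = a * a + b * b + c * c + d * d          # trace of M^T M
    det = a * d - b * c                        # det(M^T M) = det^2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    return [math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))]

def schatten(M, p):
    """Normalized Schatten p-norm with d = 2."""
    return (sum(s ** p for s in singular_values_2x2(M)) / 2) ** (1 / p)

def half_combination(A, B, sign):
    """Return (A + sign * B) / 2 entrywise."""
    return [[(A[i][j] + sign * B[i][j]) / 2 for j in range(2)] for i in range(2)]

def lemma4_gap(A, B, p):
    """Right-hand side minus left-hand side of Lemma 4 (should be nonnegative)."""
    lhs = math.sqrt(schatten(half_combination(A, B, 1), p) ** 2
                    + (p - 1) * schatten(half_combination(A, B, -1), p) ** 2)
    rhs = ((schatten(A, p) ** p + schatten(B, p) ** p) / 2) ** (1 / p)
    return rhs - lhs

def random_matrix():
    return [[random.gauss(0, 1) for _ in range(2)] for _ in range(2)]

random.seed(1)
min_gap = min(lemma4_gap(random_matrix(), random_matrix(), p)
              for p in (1.0, 1.25, 1.5, 1.75, 2.0) for _ in range(200))
print(min_gap >= -1e-9)
```

For $p = 2$ the gap vanishes up to rounding, which is the parallelogram law.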



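Theorem 1. For any $f : \{0,1\}^n \to \mathcal{M}$ and for any $1 \le p \le 2$,

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \|\hat{f}(S)\|_p^2\Big)^{1/2} \le \Big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} \|f(x)\|_p^p\Big)^{1/p}.$$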



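Proof: By induction. The case $n = 1$ follows from Lemma 4 by setting $A = f(0)$ and $B = f(1)$, and
noting that $(A + B)/2$ and $(A - B)/2$ are exactly the Fourier coefficients $\hat{f}(0)$ and $\hat{f}(1)$.

We now assume the theorem holds for $n$ and prove it for $n + 1$. Let $f : \{0,1\}^{n+1} \to \mathcal{M}$ be some
matrix-valued function. For $i \in \{0,1\}$, let $g_i = f|_{x_{n+1} = i}$ be the function obtained by fixing the last input
bit of $f$ to $i$. We apply the induction hypothesis on $g_0$ and $g_1$ to obtain

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \|\hat{g}_0(S)\|_p^2\Big)^{1/2} \le \Big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} \|g_0(x)\|_p^p\Big)^{1/p},$$

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \|\hat{g}_1(S)\|_p^2\Big)^{1/2} \le \Big(\frac{1}{2^n}\sum_{x \in \{0,1\}^n} \|g_1(x)\|_p^p\Big)^{1/p}.$$

Take the $L_p$ average of these two inequalities: raise each to the $p$th power, average them and take the $p$th
root. We get

$$\Big(\frac{1}{2}\sum_{i \in \{0,1\}} \Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \|\hat{g}_i(S)\|_p^2\Big)^{p/2}\Big)^{1/p}
\le \Big(\frac{1}{2^{n+1}}\sum_{x \in \{0,1\}^n} \big(\|g_0(x)\|_p^p + \|g_1(x)\|_p^p\big)\Big)^{1/p}
= \Big(\frac{1}{2^{n+1}}\sum_{x \in \{0,1\}^{n+1}} \|f(x)\|_p^p\Big)^{1/p}. \qquad (5)$$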

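The right-hand side is the expression we wish to lower bound. To bound the left-hand side, we need the
following inequality (to get a sense of why this holds, consider the case where $q_1 = 1$ and $q_2 = \infty$).

Lemma 5 (Minkowski's inequality, [22, Theorem 26]). For any $r_1 \times r_2$ matrix whose rows are given by
$u_1, \ldots, u_{r_1}$ and whose columns are given by $v_1, \ldots, v_{r_2}$, and any $1 \le q_1 < q_2 \le \infty$,

$$\Big\| \big(\|v_1\|_{q_2}, \ldots, \|v_{r_2}\|_{q_2}\big) \Big\|_{q_1} \ge \Big\| \big(\|u_1\|_{q_1}, \ldots, \|u_{r_1}\|_{q_1}\big) \Big\|_{q_2},$$

i.e., the value obtained by taking the $q_2$-norm of each column and then taking the $q_1$-norm of the results, is
at least that obtained by first taking the $q_1$-norm of each row and then taking the $q_2$-norm of the results.

Consider now the $2^n \times 2$ matrix whose entries are given by

$$c_{S,i} = 2^{n/2} (p-1)^{|S|/2} \|\hat{g}_i(S)\|_p,$$

where $i \in \{0,1\}$ and $S \subseteq [n]$. The left-hand side of (5) is then

$$\Big(\frac{1}{2}\sum_{i \in \{0,1\}} \Big(\frac{1}{2^n}\sum_{S \subseteq [n]} c_{S,i}^2\Big)^{p/2}\Big)^{1/p}
\ge \Big(\frac{1}{2^n}\sum_{S \subseteq [n]} \Big(\frac{1}{2}\sum_{i \in \{0,1\}} c_{S,i}^p\Big)^{2/p}\Big)^{1/2}
= \Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \Big(\frac{\|\hat{g}_0(S)\|_p^p + \|\hat{g}_1(S)\|_p^p}{2}\Big)^{2/p}\Big)^{1/2},$$

where the inequality follows from Lemma 5 with $q_1 = p$, $q_2 = 2$. We now apply Lemma 4 to deduce that
the above is lower bounded by

$$\Big(\sum_{S \subseteq [n]} (p-1)^{|S|} \Big(\Big\|\frac{\hat{g}_0(S) + \hat{g}_1(S)}{2}\Big\|_p^2 + (p-1)\Big\|\frac{\hat{g}_0(S) - \hat{g}_1(S)}{2}\Big\|_p^2\Big)\Big)^{1/2}
= \Big(\sum_{S \subseteq [n+1]} (p-1)^{|S|} \|\hat{f}(S)\|_p^2\Big)^{1/2},$$

where we used $\hat{f}(S) = \frac{1}{2}(\hat{g}_0(S) + \hat{g}_1(S))$ and $\hat{f}(S \cup \{n+1\}) = \frac{1}{2}(\hat{g}_0(S) - \hat{g}_1(S))$ for any $S \subseteq [n]$.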
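The Minkowski-type inequality of Lemma 5 used in the proof above can be spot-checked numerically with the normalized norms. The sketch below is an illustration only: the helper names, the $8 \times 2$ matrix shape (mirroring the $2^n \times 2$ matrix of the proof with $n = 3$), and the tested exponent pairs are assumptions.

```python
import random

INF = float("inf")

def normalized_norm(values, q):
    """Normalized q-norm of a sequence; q = INF gives the maximum absolute value."""
    if q == INF:
        return max(abs(v) for v in values)
    return (sum(abs(v) ** q for v in values) / len(values)) ** (1 / q)

def minkowski_sides(C, q1, q2):
    """Return (columns-first side, rows-first side) of Lemma 5 for the matrix C."""
    columns = list(zip(*C))
    # q2-norm of each column, then q1-norm of the results.
    lhs = normalized_norm([normalized_norm(col, q2) for col in columns], q1)
    # q1-norm of each row, then q2-norm of the results.
    rhs = normalized_norm([normalized_norm(row, q1) for row in C], q2)
    return lhs, rhs

random.seed(2)
worst = INF
for q1, q2 in ((1, INF), (1, 2), (1.5, 2), (1.5, 3)):
    for _ in range(200):
        C = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(8)]
        lhs, rhs = minkowski_sides(C, q1, q2)
        worst = min(worst, lhs - rhs)
print(worst >= -1e-12)
```

The pair $q_1 = 1$, $q_2 = \infty$ is the intuitive case mentioned with the lemma: an average of maxima is at least the maximum of averages.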



4 Bounds for k-out-of-n quantum random access codes

In this section we prove Theorem 2. Recall that a $k$-out-of-$n$ random access code allows us to encode $n$
bits into $m$ qubits, such that we can recover any $k$-bit substring with probability at least $p$. We now define
this notion formally. In fact, we consider a somewhat weaker notion where we only measure the success
probability for a random $k$-subset, and a random input $x \in \{0,1\}^n$. Since we only prove impossibility
results, this clearly makes our results stronger.

Definition 1. A $k$-out-of-$n$ quantum random access code on $m$ qubits with success probability $p$ (for short
$(k, n, m, p)$-QRAC), is a map

$$f : \{0,1\}^n \to \mathbb{C}^{2^m \times 2^m}$$

that assigns an $m$-qubit density matrix $f(x)$ to every $x \in \{0,1\}^n$, and a quantum measurement $\{M_{S,z}\}_{z \in \{0,1\}^k}$
to every set $S \in \binom{[n]}{k}$, with the property that

$$\mathbb{E}_{x,S}\big[\mathrm{Tr}(M_{S,x_S} \cdot f(x))\big] \ge p,$$

where the expectation is taken over a uniform choice of $x \in \{0,1\}^n$ and $S \in \binom{[n]}{k}$, and $x_S$ denotes the $k$-bit
substring of $x$ specified by $S$.

In order to prove Theorem 2, we introduce another notion of QRAC, which we call XOR-QRAC. Here,
the goal is to predict the XOR of the $k$ bits indexed by $S$ (as opposed to guessing all the bits in $S$). Since one
can always predict a bit with probability 1/2, it is convenient to define the bias of the prediction as $\varepsilon = 2p - 1$
where $p$ is the probability of a correct prediction. Hence a bias of 1 means that the prediction is always
correct, whereas a bias of $-1$ means that it is always wrong. The advantage of dealing with an XOR-QRAC
is that it is easy to express the best achievable prediction bias without any need to introduce measurements.
Namely, if $f : \{0,1\}^n \to \mathbb{C}^{2^m \times 2^m}$ is the encoding function, then the best achievable bias in predicting the
XOR of the bits in $S$ (over a random $x \in \{0,1\}^n$) is exactly half the trace distance between the average of $f(x)$
over all $x$ with the XOR of the bits in $S$ being 0 and the average of $f(x)$ over all $x$ with the XOR of the bits
in $S$ being 1. Using our notation for Fourier coefficients, this can be written simply as $\|\hat{f}(S)\|_{tr}$.

Definition 2. A $k$-out-of-$n$ XOR quantum random access code on $m$ qubits with bias $\varepsilon$ (for short
$(k, n, m, \varepsilon)$-XOR-QRAC), is a map

$$f : \{0,1\}^n \to \mathbb{C}^{2^m \times 2^m}$$

that assigns an $m$-qubit density matrix $f(x)$ to every $x \in \{0,1\}^n$ and has the property that

$$\mathbb{E}_{S \sim \binom{[n]}{k}}\big[\|\hat{f}(S)\|_{tr}\big] \ge \varepsilon.$$



Our new hypercontractive inequality allows us to easily derive the following key lemma: 

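Lemma 6. Let $f: \{0,1\}^n \to \mathbb{C}^{2^m \times 2^m}$ be any mapping from $n$-bit strings to $m$-qubit density matrices.
Then for any $0 \le \delta \le 1$, we have

\[ \sum_{S \subseteq [n]} \delta^{|S|} \|\widehat{f}(S)\|_{\mathrm{tr}}^2 \le 2^{2\delta m}. \]

Proof: Let $p = 1 + \delta$. On one hand, by Theorem 1 and the fact that $\|f(x)\|_p^p \le 2^{-m}$ for every $m$-qubit
density matrix $f(x)$ (in the normalized Schatten $p$-norm), we have

\[ \sum_{S \subseteq [n]} (p-1)^{|S|} \|\widehat{f}(S)\|_p^2 \le \left(\frac{1}{2^n} \sum_{x \in \{0,1\}^n} \|f(x)\|_p^p\right)^{2/p} \le \left(2^{-m}\right)^{2/p} = 2^{-2m/p}. \]

On the other hand, by norm monotonicity we have

\[ \sum_{S \subseteq [n]} (p-1)^{|S|} \|\widehat{f}(S)\|_p^2 \ge \sum_{S \subseteq [n]} (p-1)^{|S|} \|\widehat{f}(S)\|_1^2 = 2^{-2m} \sum_{S \subseteq [n]} (p-1)^{|S|} \|\widehat{f}(S)\|_{\mathrm{tr}}^2. \]

By rearranging we have

\[ \sum_{S \subseteq [n]} (p-1)^{|S|} \|\widehat{f}(S)\|_{\mathrm{tr}}^2 \le 2^{2m(1-1/p)} \le 2^{2m(p-1)}, \]

as required. ■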
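As a sanity check of Lemma 6, one can work out the left-hand side for a classical encoding that simply stores the first $m$ bits of $x$. This encoding is an assumption chosen for illustration; it is not taken from the text.

```latex
% Assumed example: f(x) = |x_1 ... x_m><x_1 ... x_m|, a diagonal m-qubit density matrix.
\[
\widehat{f}(S) = \frac{1}{2^m} \sum_{y \in \{0,1\}^m} (-1)^{\sum_{i \in S} y_i}\, |y\rangle\langle y|
\quad (S \subseteq [m]),
\qquad
\widehat{f}(S) = 0 \quad (S \not\subseteq [m]),
\]
% Each nonzero coefficient has 2^m diagonal entries of absolute value 2^{-m}, so its trace norm is 1.
\[
\sum_{S \subseteq [n]} \delta^{|S|}\, \|\widehat{f}(S)\|_{\mathrm{tr}}^2
= \sum_{S \subseteq [m]} \delta^{|S|}
= (1+\delta)^m \le e^{\delta m} \le 2^{2\delta m}.
\]
```

So for this encoding the lemma holds with room to spare only in the constant of the exponent ($1$ versus $2\ln 2 \approx 1.39$ for small $\delta$).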



The following is our main theorem regarding XOR-QRAC. In particular it shows that if $k = o(n)$ and
$m/n < 1/(2\ln 2) \approx 0.721$, then the bias will be exponentially small in $k$.

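Theorem 7. For any $(k,n,m,\varepsilon)$-XOR-QRAC we have the following bound on the bias:

\[ \varepsilon \le \left(\frac{(2e\ln 2)\,m}{k}\right)^{k/2} \binom{n}{k}^{-1/2}. \]

In particular, for any $\eta > 2\ln 2$ there exists a constant $C_\eta$ such that if $n/k$ is large enough then for any
$(k,n,m,\varepsilon)$-XOR-QRAC,

\[ \varepsilon \le C_\eta \left(\frac{\eta m}{n}\right)^{k/2}. \]

Proof: Apply Lemma 6 with $\delta = \frac{k}{(2\ln 2)\,m}$ and only take the sum on $S$ with $|S| = k$. Since $2^{2\delta m} = e^k$ for this choice of $\delta$, this gives

\[ \mathbb{E}_{S \sim \binom{[n]}{k}}\left[\|\widehat{f}(S)\|_{\mathrm{tr}}^2\right] \le \left(\frac{(2e\ln 2)\,m}{k}\right)^{k} \binom{n}{k}^{-1}. \]

The first bound on $\varepsilon$ now follows by convexity (Jensen's inequality). To derive the second bound, approximate
$\binom{n}{k}$ using Stirling's approximation $n! = \Theta(\sqrt{n}\,(n/e)^n)$:

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \Theta\left(\sqrt{\frac{n}{k(n-k)}} \left(\frac{n}{k}\right)^{k} \left(1 + \frac{k}{n-k}\right)^{n-k}\right). \]

Now use the fact that for large enough $n/k$ we have $(1 + k/(n-k))^{(n-k)/k} > (2e\ln 2)/\eta$, and notice that
the factor $\sqrt{n/(k(n-k))} \ge \sqrt{1/k}$ can be absorbed by this approximation. ■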
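To get a feel for the first bound of Theorem 7, the following sketch evaluates it numerically. The parameter values are hypothetical and the function name is ours; the computation goes through logarithms only to avoid overflow.

```python
from math import comb, e, exp, log

def xor_qrac_bias_bound(k, n, m):
    """Evaluate ((2e ln 2) m / k)^(k/2) * C(n, k)^(-1/2), computed via logarithms."""
    log_bound = 0.5 * k * log(2 * e * log(2) * m / k) - 0.5 * log(comb(n, k))
    return exp(log_bound)

# Hypothetical parameters with m/n = 0.5, below the threshold 1/(2 ln 2).
n, m = 1000, 500
bounds = {k: xor_qrac_bias_bound(k, n, m) for k in (10, 20, 40)}
```

For these values the bound is below 1 and shrinks as $k$ grows, in line with the claim that the bias is exponentially small in $k$ when $m/n < 1/(2\ln 2)$.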

We now derive Theorem 2 from Theorem 7.



Proof of Theorem 2: Consider a $(k,n,m,p)$-QRAC, given by encoding function $f$ and measurements
$\{M_{T,z}\}_{z \in \{0,1\}^k}$ for all $T \in \binom{[n]}{k}$. Define $p_T(w) = \mathbb{E}_x[\Pr[z \oplus x_T = w]]$ as the distribution on the "error
vector" $w \in \{0,1\}^k$ of the measurement outcome $z \in \{0,1\}^k$ when applying $\{M_{T,z}\}$. By definition,
we have that $p \le \mathbb{E}_T[p_T(0^k)]$.

Now suppose we want to predict the parity of the bits of some set $S$ of size at most $k$. We can do this as
follows: uniformly pick a set $T \in \binom{[n]}{k}$ that contains $S$, measure $f(x)$ with $\{M_{T,z}\}$, and output the parity
of the bits corresponding to $S$ in the measurement outcome $z$. Note that our output is correct if and only if
the bits corresponding to $S$ in the error vector $w$ have even parity. Hence the bias of our output is

\[ \beta_S = \mathbb{E}_{T:\, T \supseteq S}\left[\sum_{w \in \{0,1\}^k} p_T(w)\, \chi_S(w)\right] = 2^k\, \mathbb{E}_{T:\, T \supseteq S}\left[\widehat{p_T}(S)\right]. \]

(We slightly abuse notation here by viewing $S$ both as a subset of $T$ and as a subset of $[k]$ obtained by
identifying $T$ with $[k]$.) Notice that $\beta_S$ can be upper bounded by the best-achievable bias $\|\widehat{f}(S)\|_{\mathrm{tr}}$.

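Consider the distribution $\mathcal{S}$ on sets $S$ defined as follows: first pick $j$ from the binomial distribution
$B(k, 1/2)$ and then uniformly pick $S \in \binom{[n]}{j}$. Notice that the distribution on pairs $(S,T)$ obtained by first
choosing $S \sim \mathcal{S}$ and then choosing a uniform $T \supseteq S$ from $\binom{[n]}{k}$ is identical to the one obtained by first
choosing uniformly $T$ from $\binom{[n]}{k}$ and then choosing a uniform $S \subseteq T$. This allows us to show that the
average bias $\beta_S$ over $S \sim \mathcal{S}$ is at least $p$, as follows:

\[ \mathbb{E}_{S \sim \mathcal{S}}[\beta_S] = 2^k\, \mathbb{E}_{T \sim \binom{[n]}{k},\, S \subseteq T}\left[\widehat{p_T}(S)\right] = \mathbb{E}_{T \sim \binom{[n]}{k}}\left[\sum_{S \subseteq T} \widehat{p_T}(S)\right] = \mathbb{E}_{T \sim \binom{[n]}{k}}\left[p_T(0^k)\right] \ge p, \]

where the last equality follows from the Fourier inversion formula $p_T(w) = \sum_{S} \widehat{p_T}(S)\chi_S(w)$ evaluated at $w = 0^k$.
On the other hand, using Theorem 7 we obtain

\[ \mathbb{E}_{S \sim \mathcal{S}}[\beta_S] \le \mathbb{E}_{S \sim \mathcal{S}}\left[\|\widehat{f}(S)\|_{\mathrm{tr}}\right] \le \frac{1}{2^k} \sum_{j=0}^{k} \binom{k}{j} C_\eta \left(\frac{\eta m}{n}\right)^{j/2} = C_\eta \left(\frac{1}{2} + \frac{1}{2}\sqrt{\frac{\eta m}{n}}\right)^{k}, \]

where the last equality uses the binomial theorem. Combining the two inequalities completes the proof. ■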
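Two combinatorial facts carry the proof above: the two ways of sampling a pair $(S,T)$ give the same distribution, and the binomial theorem collapses the final sum. The sketch below checks both exactly on a tiny instance; the parameters $n = 5$, $k = 2$, $a = 1/3$ and all function names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def pairs_S_then_T(n, k):
    """Pick j ~ B(k, 1/2), S uniform of size j, then T uniform among k-sets containing S."""
    dist = {}
    for j in range(k + 1):
        p_j = Fraction(comb(k, j), 2 ** k)
        for S in combinations(range(n), j):
            supersets = [T for T in combinations(range(n), k) if set(S) <= set(T)]
            for T in supersets:
                dist[(S, T)] = p_j / comb(n, j) / len(supersets)
    return dist

def pairs_T_then_S(n, k):
    """Pick T uniform of size k, then S uniform among the 2^k subsets of T."""
    dist = {}
    for T in combinations(range(n), k):
        for j in range(k + 1):
            for S in combinations(T, j):
                dist[(S, T)] = Fraction(1, comb(n, k) * 2 ** k)
    return dist

same_distribution = pairs_S_then_T(5, 2) == pairs_T_then_S(5, 2)

# Binomial-theorem step: 2^-k * sum_j C(k, j) a^j = ((1 + a) / 2)^k.
k, a = 6, Fraction(1, 3)
lhs = sum(comb(k, j) * a ** j for j in range(k + 1)) / 2 ** k
rhs = ((1 + a) / 2) ** k
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating-point approximations.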



5 Direct product theorem for one-way quantum communication 

The setting of communication complexity is by now well-known, so we will not give formal definitions of
protocols etc., referring to [37, 57] instead. Consider the $n$-bit Disjointness problem in 2-party communication
complexity. Alice receives $n$-bit string $x$ and Bob receives $n$-bit string $y$. They interpret these
strings as subsets of $[n]$ and want to decide whether their sets are disjoint. In other words, $\mathrm{DISJ}_n(x,y) = 1$
if and only if $x \cap y = \emptyset$. Let $\mathrm{DISJ}_n^{(k)}$ denote $k$ independent instances of this problem. That is, Alice's
input is a $k$-tuple $x_1, \ldots, x_k$ of $n$-bit strings, Bob's input is a $k$-tuple $y_1, \ldots, y_k$, and they should output
all $k$ bits: $\mathrm{DISJ}_n^{(k)}(x_1, \ldots, x_k, y_1, \ldots, y_k) = \mathrm{DISJ}_n(x_1,y_1), \ldots, \mathrm{DISJ}_n(x_k,y_k)$. The trivial protocol where
Alice sends all her inputs to Bob has success probability 1 and communication complexity $kn$. We want to
show that if the total one-way communication is much smaller than $kn$ qubits, then the success probability
is exponentially small in $k$. We will do that by deriving a random access code from the protocol's message.

Lemma 8. Let $\ell \le k$. If there is a $c$-qubit one-way communication protocol for $\mathrm{DISJ}_n^{(k)}$ with success
probability $\sigma$, then there is an $\ell$-out-of-$kn$ quantum random access code of $c$ qubits with success probability
$p \ge \sigma(1 - \ell/k)^{\ell}$.

Proof: Consider the following one-way communication setting: Alice has a $kn$-bit string $x$, and Bob has
$\ell$ distinct indices $i_1, \ldots, i_\ell \in [kn]$ chosen uniformly from $\binom{[kn]}{\ell}$ and wants to learn the corresponding bits
of $x$.

In order to do this, Alice sends the $c$-qubit message corresponding to input $x$ in the $\mathrm{DISJ}_n^{(k)}$ protocol.
We view $x$ as consisting of $k$ disjoint blocks of $n$ bits each. The probability (over the choice of Bob's input)
that $i_1, \ldots, i_\ell \in [kn]$ are in $\ell$ different blocks is

\[ \prod_{i=0}^{\ell-1} \frac{kn - in}{kn - i} \ge \left(\frac{kn - \ell n}{kn}\right)^{\ell} = \left(1 - \frac{\ell}{k}\right)^{\ell}. \]

If this is the case, Bob chooses his Disjointness inputs $y_1, \ldots, y_k$ as follows. If index $i_j$ is somewhere in
block $b \in [k]$, then he chooses $y_b$ to be the string having a 1 at the position where $i_j$ is, and 0s elsewhere.
Note that the correct output for the $b$-th instance of Disjointness with inputs $x$ and $y_1, \ldots, y_k$ is exactly
$1 - x_{i_j}$. Now Bob completes the protocol and gets a $k$-bit output for the $k$-fold Disjointness problem. A
correct output tells him the $\ell$ bits he wants to know (he can just disregard the outcomes of the other $k - \ell$
instances). Overall the success probability is at least $\sigma(1 - \ell/k)^{\ell}$. Therefore, the random access code that
encodes $x$ by Alice's message proves the lemma. ■
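The probability that Bob's indices land in distinct blocks can be checked directly on a small instance. The sketch below compares the product formula from the proof with a brute-force count and with the lower bound $(1 - \ell/k)^{\ell}$; the parameters and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def prob_distinct_blocks_formula(k, n, ell):
    """Product formula: prod_{i=0}^{ell-1} (kn - i*n) / (kn - i)."""
    prob = Fraction(1)
    for i in range(ell):
        prob *= Fraction(k * n - i * n, k * n - i)
    return prob

def prob_distinct_blocks_bruteforce(k, n, ell):
    """Fraction of ell-subsets of [kn] whose elements fall in ell different blocks."""
    hits = total = 0
    for subset in combinations(range(k * n), ell):
        total += 1
        if len({index // n for index in subset}) == ell:
            hits += 1
    return Fraction(hits, total)

k, n, ell = 4, 3, 2
exact = prob_distinct_blocks_formula(k, n, ell)
counted = prob_distinct_blocks_bruteforce(k, n, ell)
lower = Fraction(k - ell, k) ** ell
```

With $k = 4$ blocks of $n = 3$ bits and $\ell = 2$ indices, both computations give $9/11$, comfortably above the bound $1/4$.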



Combining the previous lemma with our earlier upper bound on $p$ for $\ell$-out-of-$kn$ quantum random
access codes (Theorem 2), we obtain the following upper bound on the success probability $\sigma$ of $c$-qubit
one-way communication protocols for $\mathrm{DISJ}_n^{(k)}$. For every $\eta > 2\ln 2$ there exists a constant $C_\eta$ such that:

\[ \sigma \le p\,(1 - \ell/k)^{-\ell} \le C_\eta \left(\frac{\frac{1}{2} + \frac{1}{2}\sqrt{\eta c/(kn)}}{1 - \ell/k}\right)^{\ell}. \]

Choosing $\ell$ a sufficiently small constant fraction of $k$ (depending on $\eta$), we obtain a strong direct product
theorem for one-way communication:

Theorem 9. For any $\eta > 2\ln 2$ the following holds: for any large enough $n$ and any $k$, every one-way
quantum protocol for $\mathrm{DISJ}_n^{(k)}$ that communicates $c \le kn/\eta$ qubits has success probability $\sigma \le 2^{-\Omega(k)}$
(where the constant in the $\Omega(\cdot)$ depends on $\eta$).

The above strong direct product theorem (SDPT) bounds the success probability for protocols that are
required to compute all $k$ instances correctly. We call this a zero-error SDPT. What if we settle for a weaker
notion of "success", namely getting a $(1 - \varepsilon)$-fraction of the $k$ instances right, for some small $\varepsilon > 0$?
An $\varepsilon$-error SDPT is a theorem to the effect that even in this case the success probability is exponentially
small. An $\varepsilon$-error SDPT follows from a zero-error SDPT as follows. Run an $\varepsilon$-error protocol with success
probability $p$ ("success" now means getting $1 - \varepsilon$ of the $k$ instances right), guess up to $\varepsilon k$ positions and
change them. With probability at least $p$, the number of errors of the $\varepsilon$-error protocol is at most $\varepsilon k$, and with
probability at least $1/\sum_{i=0}^{\varepsilon k} \binom{k}{i}$ we have corrected all those errors. Since $\sum_{i=0}^{\varepsilon k} \binom{k}{i} \le 2^{kH(\varepsilon)}$ (see,
e.g., [29, Corollary 23.6]), we have a protocol that computes all instances correctly with success probability
$\sigma \ge p\,2^{-kH(\varepsilon)}$. If we have a zero-error SDPT that bounds $\sigma \le 2^{-\gamma k}$ for some $\gamma > H(\varepsilon)$, then it follows
that $p$ must be exponentially small as well: $p \le 2^{-(\gamma - H(\varepsilon))k}$. Hence Theorem 9 implies:

Theorem 10. For any $\eta > 2\ln 2$ there exists an $\varepsilon > 0$ such that the following holds: for every one-way
quantum protocol for $\mathrm{DISJ}_n^{(k)}$ that communicates $c \le kn/\eta$ qubits, its probability to compute at least a
$(1 - \varepsilon)$-fraction of the $k$ instances correctly is at most $2^{-\Omega(k)}$.
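The reduction from $\varepsilon$-error to zero-error relies on the estimate $\sum_{i \le \varepsilon k} \binom{k}{i} \le 2^{kH(\varepsilon)}$, where $H$ is the binary entropy function and $\varepsilon \le 1/2$. A quick numerical check of that estimate, with hypothetical parameters and helper names of our own choosing, might look like this:

```python
from math import comb, log2

def binary_entropy(eps):
    """H(eps) = -eps*log2(eps) - (1-eps)*log2(1-eps), with H(0) = H(1) = 0."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * log2(eps) - (1 - eps) * log2(1 - eps)

def hamming_ball_size(k, radius):
    """Number of ways to choose at most `radius` of k positions to change."""
    return sum(comb(k, i) for i in range(radius + 1))

# Hypothetical instance: k = 100 instances, at most 10 of them wrong (eps = 0.1).
k, radius = 100, 10
ball = hamming_ball_size(k, radius)
bound = 2 ** (k * binary_entropy(radius / k))

# The same comparison for every radius up to k/2 on a smaller instance.
holds_for_k20 = all(
    hamming_ball_size(20, r) <= 2 ** (20 * binary_entropy(r / 20)) for r in range(11)
)
```

The guessing step therefore costs at most a factor $2^{kH(\varepsilon)}$ in success probability, which is why $\gamma > H(\varepsilon)$ is needed.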

6 Lower bounds on locally decodable codes 

When analyzing locally decodable codes, it will be convenient to view bits as elements of {±1} instead of 
{0, 1}. Formally, a locally decodable code is defined as follows. 

Definition 3. $C: \{\pm 1\}^n \to \{\pm 1\}^N$ is a $(q, \delta, \varepsilon)$-locally decodable code (LDC) if there is a randomized
decoding algorithm $A$ such that

1. For all $x \in \{\pm 1\}^n$, $i \in [n]$, and $y \in \{\pm 1\}^N$ with Hamming distance $d(C(x), y) \le \delta N$, we have
$\Pr[A^y(i) = x_i] \ge 1/2 + \varepsilon$. Here $A^y(i)$ is the random variable that is $A$'s output given input $i$ and
oracle $y$.

2. $A$ makes at most $q$ queries to $y$, non-adaptively.

In Appendix B we show that such a code implies the following: For each $i \in [n]$, there is a set $M_i$ of at
least $\delta\varepsilon N/q^2$ disjoint tuples, each of at most $q$ elements from $[N]$, and a sign $a_{i,Q} \in \{\pm 1\}$ for each $Q \in M_i$,
such that

\[ \mathbb{E}_x\Big[a_{i,Q}\, x_i \prod_{j \in Q} C(x)_j\Big] \ge \frac{\varepsilon}{2^q}, \]

where the expectation is uniformly over all $x \in \{\pm 1\}^n$. In other words, the parity of each of the tuples in
$M_i$ allows us to predict $x_i$ with non-trivial bias (averaged over all $x$).
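For intuition about this decoding format, consider the Hadamard code, a standard 2-query LDC of length $N = 2^n$. It is not discussed in the text and is used here only as an assumed example: position $y \in \{0,1\}^n$ holds the parity $\prod_{j: y_j = 1} x_j$, and for each $i$ the pairs $(y, y \oplus e_i)$ form a matching whose parities equal $x_i$ exactly.

```python
from itertools import product

def hadamard_encode(x):
    """Hypothetical example code: position y in {0,1}^n holds prod_{j: y_j = 1} x_j."""
    n = len(x)
    codeword = {}
    for y in product((0, 1), repeat=n):
        value = 1
        for bit, x_j in zip(y, x):
            if bit:
                value *= x_j
        codeword[y] = value
    return codeword

def matching_for_index(n, i):
    """Disjoint pairs (y, y with bit i flipped), taking y with y_i = 0."""
    pairs = []
    for y in product((0, 1), repeat=n):
        if y[i] == 0:
            flipped = y[:i] + (1,) + y[i + 1:]
            pairs.append((y, flipped))
    return pairs

def correlation(n, i, pair):
    """E_x[x_i * C(x)_y * C(x)_y'] over uniform x in {+1,-1}^n."""
    total = 0
    for x in product((1, -1), repeat=n):
        code = hadamard_encode(x)
        total += x[i] * code[pair[0]] * code[pair[1]]
    return total / 2 ** n

n = 3
correlations = [correlation(n, i, pair) for i in range(n) for pair in matching_for_index(n, i)]
```

In this idealized case every pair has correlation 1 with sign $a_{i,Q} = +1$; a general LDC only guarantees the weaker bias $\varepsilon/2^q$ on a matching of size $\delta\varepsilon N/q^2$.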

Kerenidis and de Wolf [33] used quantum information theory to show an exponential lower bound $N = 2^{\Omega(n)}$
(for constant $\delta$ and $\varepsilon$) on the length of 2-query LDCs. Using the new hypercontractive inequality, we can prove a similar lower bound.
Our dependence on $\varepsilon$ and $\delta$ is slightly worse than theirs, but can probably be improved by a more careful analysis.

Theorem 11. If $C: \{\pm 1\}^n \to \{\pm 1\}^N$ is a $(2, \delta, \varepsilon)$-LDC, then $N = 2^{\Omega(\delta^2 \varepsilon^4 n)}$.

Proof: Define $f(x)$ as the $N \times N$ matrix whose $(i,j)$-entry is $C(x)_i C(x)_j$. Since $f(x)$ has rank 1 and its
$N^2$ entries are all $+1$ or $-1$, its only non-zero singular value is $N$. Hence $\|f(x)\|_p^p = N^{p-1}$ for every $x$.

Consider the $N \times N$ matrices $\widehat{f}(\{i\})$ that are the Fourier transform of $f$ at the singleton sets $\{i\}$:

\[ \widehat{f}(\{i\}) = \frac{1}{2^n} \sum_{x \in \{\pm 1\}^n} f(x)\, x_i. \]

We want to lower bound $\|\widehat{f}(\{i\})\|_p$.

With the above notation, each set $M_i$ consists of at least $\delta\varepsilon N/4$ disjoint pairs of indices.† For simplicity
assume $M_i = \{(1,2), (3,4), (5,6), \ldots\}$. The $2 \times 2$ submatrix in the upper left corner of $f(x)$ is

\[ \begin{pmatrix} 1 & C(x)_1 C(x)_2 \\ C(x)_1 C(x)_2 & 1 \end{pmatrix}. \]

Since $(1,2) \in M_i$, we have $\mathbb{E}_x[C(x)_1 C(x)_2\, x_i\, a_{i,(1,2)}] \in [\varepsilon/4, 1]$. Hence the $2 \times 2$ submatrix in the upper
left corner of $\widehat{f}(\{i\})$ is

\[ \begin{pmatrix} 0 & a \\ a & 0 \end{pmatrix} \]

for some $a$ with $|a| \in [\varepsilon/4, 1]$. The same is true for each of the first $\delta\varepsilon N/4$ $2 \times 2$ diagonal blocks of $\widehat{f}(\{i\})$
(each such $2 \times 2$ block corresponds to a pair in $M_i$). Let $P$ be the $N \times N$ permutation matrix that swaps
rows 1 and 2, swaps rows 3 and 4, etc. Then the first $\delta\varepsilon N/2$ diagonal entries of $F_i = P\widehat{f}(\{i\})$ all have
absolute value in $[\varepsilon/4, 1]$.

The $\|\cdot\|_p$ norm is unitarily invariant: $\|UAV\|_p = \|A\|_p$ for every matrix $A$ and unitaries $U, V$. Note the
following lemma, which is a special case of HI Eq. (IV.52) on p. 97]. We include its proof for completeness. 

Lemma 12. Let $\|\cdot\|$ be a unitarily-invariant norm on the set of $d \times d$ complex matrices. If $A$ is a matrix and
$\mathrm{diag}(A)$ is the matrix obtained from $A$ by setting its off-diagonal entries to 0, then $\|\mathrm{diag}(A)\| \le \|A\|$.

Proof: We will step-by-step set the off-diagonal entries of $A$ to 0, without increasing its norm. We start
with the off-diagonal entries in the $d$th row and column. Let $D_d$ be the diagonal matrix that has $(D_d)_{d,d} = -1$
and $(D_d)_{i,i} = 1$ for $i < d$. Note that $D_d A D_d$ is the same as $A$, except that the off-diagonal entries of the $d$th
row and column are multiplied by $-1$. Hence $A' = (A + D_d A D_d)/2$ is the matrix obtained from $A$ by
setting those entries to 0 (this doesn't affect the diagonal). Since $D_d$ is unitary and every norm satisfies the
triangle inequality, we have

\[ \|A'\| = \|(A + D_d A D_d)/2\| \le \tfrac{1}{2}\left(\|A\| + \|D_d A D_d\|\right) = \|A\|. \]

In the second step, we can set the off-diagonal entries in the $(d-1)$st row and column of $A'$ to 0, using the
diagonal matrix $D_{d-1}$ which has a $-1$ only on its $(d-1)$st position. Continuing in this manner, we set all
off-diagonal entries of $A$ to zero without affecting its diagonal, and without increasing its norm. ■

† Actually some of the elements of $M_i$ may be singletons. Dealing with this is a technicality that we will ignore here in order to
simplify the presentation.
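The averaging step in the proof of Lemma 12 is easy to trace on a concrete matrix. The sketch below applies $A \mapsto (A + D A D)/2$ row by row to a hypothetical $3 \times 3$ real matrix and tracks the Frobenius norm, used here as one example of a unitarily invariant norm; the matrix and the function names are assumptions for illustration.

```python
from math import sqrt

def conjugate_by_sign_flip(A, r):
    """Return D A D where D is the identity except for a -1 at diagonal position r."""
    d = len(A)
    sign = [-1 if i == r else 1 for i in range(d)]
    return [[sign[i] * A[i][j] * sign[j] for j in range(d)] for i in range(d)]

def pinch_step(A, r):
    """Return (A + D A D) / 2, which zeroes the off-diagonal entries of row and column r."""
    DAD = conjugate_by_sign_flip(A, r)
    d = len(A)
    return [[(A[i][j] + DAD[i][j]) / 2 for j in range(d)] for i in range(d)]

def frobenius_norm(A):
    """Frobenius norm of a real matrix, one example of a unitarily invariant norm."""
    return sqrt(sum(entry * entry for row in A for entry in row))

A = [[1.0, 2.0, -3.0],
     [4.0, 5.0, 6.0],
     [-7.0, 8.0, 9.0]]

norms = [frobenius_norm(A)]
current = A
for r in reversed(range(len(A))):   # last row/column first, as in the proof
    current = pinch_step(current, r)
    norms.append(frobenius_norm(current))
```

After the loop `current` is the diagonal part of `A`, and the recorded norms never increase from one step to the next.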

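Using this lemma, we obtain

\[ \|\widehat{f}(\{i\})\|_p = \|F_i\|_p \ge \|\mathrm{diag}(F_i)\|_p \ge \left(\frac{1}{N}\,(\delta\varepsilon N/2)\,(\varepsilon/4)^p\right)^{1/p} = (\delta\varepsilon/2)^{1/p}\,\varepsilon/4. \]

Using the hypercontractive inequality (Theorem 1), we have for any $p \in [1,2]$

\[ n(p-1)(\delta\varepsilon/2)^{2/p}(\varepsilon/4)^2 \le \sum_{i=1}^{n} (p-1)\,\|\widehat{f}(\{i\})\|_p^2 \le \left(\frac{1}{2^n} \sum_{x} \|f(x)\|_p^p\right)^{2/p} = N^{2(p-1)/p}. \]

Choosing $p = 1 + 1/\log N$ and rearranging implies the result. ■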
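For completeness, the final rearrangement can be spelled out as follows. This is a sketch under the assumptions that $\log$ is taken base 2 and $N \ge 2$, so that $p = 1 + 1/\log N \in (1, 2]$; the explicit constant 256 is what this particular calculation yields and is not claimed by the text.

```latex
\begin{align*}
N^{2(p-1)/p} &\le N^{2(p-1)} = N^{2/\log N} = 4
  && \text{(since } p \ge 1\text{)},\\
(\delta\varepsilon/2)^{2/p} &\ge (\delta\varepsilon/2)^{2}
  && \text{(since } \delta\varepsilon/2 \le 1 \text{ and } 2/p \le 2\text{)},\\
\frac{n}{\log N}\cdot\frac{\delta^2\varepsilon^2}{4}\cdot\frac{\varepsilon^2}{16} &\le 4
  && \Longrightarrow\quad \log N \ge \frac{\delta^2\varepsilon^4}{256}\, n .
\end{align*}
```

This gives $N = 2^{\Omega(\delta^2\varepsilon^4 n)}$, matching the statement of Theorem 11.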

Let us elaborate on the similarities and differences between this proof and the quantum proof of [33].
On the one hand, the present proof makes no use of quantum information theory. It only uses the well-known
version of LDCs mentioned after Definition 3, some basic matrix analysis, and our hypercontractive
inequality for matrix-valued functions. On the other hand, the proof may still be viewed as a translation of
the original quantum proof to a different language. The quantum proof defines, for each $x$, a $\log(N)$-qubit
state $|\phi(x)\rangle$ which is the uniform superposition over the indices of the codeword $C(x)$. It then proceeds
in two steps: (1) by viewing the elements of $M_i$ as 2-dimensional projectors in a quantum measurement
of $|\phi(x)\rangle$, we can with good probability recover the parity $C(x)_j C(x)_k$ for a random element $(j,k)$ of
the matching $M_i$. Since that parity has non-trivial correlation with $x_i$, the states $|\phi(x)\rangle$ form a quantum
random access code: they allow us to recover each $x_i$ with decent probability (averaged over all $x$); (2) the
quantum proof then invokes Nayak's linear lower bound on the number of qubits of a random access code
to conclude $\log N = \Omega(n)$. The present proof mimics this quantum proof quite closely: the matrix $f(x)$
is, up to normalization, the density matrix corresponding to the state $|\phi(x)\rangle$; the fact that matrix $\widehat{f}(\{i\})$ has
fairly high norm corresponds to the fact that the parity produced by the quantum measurement has fairly
good correlation with $x_i$; and finally, our invocation of Theorem 1 replaces (but is not identical to) the
linear lower bound on quantum random access codes. We feel that by avoiding any explicit use of quantum
information theory, the new proof holds some promise for potential extensions to codes with $q \ge 3$.

A 3-party NOF communication complexity of Disjointness

Some of the most interesting open problems in communication complexity arise in the "number on the
forehead" (NOF) model of multiparty communication complexity, with applications ranging from bounds
on proof systems to circuit lower bounds. Here, there are $\ell$ players and $\ell$ inputs $x_1, \ldots, x_\ell$. The players want
to compute some function $f(x_1, \ldots, x_\ell)$. Each player $j$ sees all inputs except $x_j$. In the $\ell$-party version of
the Disjointness problem, the $\ell$ players want to figure out whether there is an index $i \in [n]$ where all $\ell$ input
strings have a 1. For any constant $\ell$, the best known upper bound is linear in $n$ [20].

While the case $\ell = 2$ has been well-understood for a long time, the first polynomial lower bounds for
$\ell \ge 3$ were shown only very recently. Lee and Shraibman [40], and independently Chattopadhyay and
Ada [15], showed lower bounds of the form $\Omega(n^{1/(\ell+1)})$ on the classical communication complexity for
constant $\ell$. This becomes $\Omega(n^{1/4})$ for $\ell = 3$ players.

Stronger lower bounds can be shown if we limit the kind of interaction allowed between the players.
Viola and Wigderson [56] showed a lower bound of $\Omega(n^{1/(\ell-1)})$ for the one-way complexity of $\ell$-player
Disjointness, for any constant $\ell$. In particular, this gives $\Omega(\sqrt{n})$ for $\ell = 3$.* An intermediate model was
studied by Beame et al. [5], namely protocols where Charlie first sends a message to Bob, and then Alice and
Bob are allowed two-way communication between each other to compute $\mathrm{DISJ}_n(x_1, x_2, x_3)$. This model
is weaker than full interaction, but stronger than the one-way model. Beame et al. showed (using a direct
product theorem) that any protocol of this form requires $\Omega(n^{1/3})$ bits of communication.**

Here we strengthen these two 3-player results to quantum communication complexity, while at the same
time slightly simplifying the proofs. These results will follow easily from two direct product theorems: the
one for two-way communication from [35], and the new one for one-way communication that we prove here.
Lee, Schechtman, and Shraibman [39] have recently extended their $\Omega(n^{1/(\ell+1)})$ classical lower bound to $\ell$-
player quantum protocols. While that result holds for a stronger communication model than ours (arbitrary
point-to-point quantum messages), their bound for $\ell = 3$ is weaker than ours.

A.1 Communication-type $C \to (B \leftrightarrow A)$

Consider 3-party Disjointness on inputs $x, y, z \in \{0,1\}^n$. Here Alice sees $x$ and $z$, Bob sees $y$ and $z$, and
Charlie sees $x$ and $y$. Their goal is to decide if there is an $i \in [n]$ such that $x_i = y_i = z_i = 1$.

Suppose we have a 3-party protocol $P$ for Disjointness with the following "flow" of communication.
Charlie sends a message of $c_1$ classical bits to Alice and Bob (or just to Bob, it doesn't really matter),
who then exchange $c_2$ qubits and compute Disjointness with bounded error probability. Our lower bound
approach is similar to the one of Beame et al. [5], the main change being our use of stronger direct product
theorems. Combining the (0-error) two-way quantum strong direct product theorem for Disjointness
from [35] with the argument from the end of our Section 5, we have the following $\varepsilon$-error strong direct
product theorem for $k$ instances of 2-party Disjointness:

Theorem 13. There exist constants $\varepsilon > 0$ and $\alpha > 0$ such that the following holds: for every two-way
quantum protocol for $\mathrm{DISJ}_n^{(k)}$ that communicates at most $\alpha k \sqrt{n}$ qubits, its probability to compute at least
a $(1 - \varepsilon)$-fraction of the $k$ instances correctly is at most $2^{-\Omega(k)}$.

Assume without loss of generality that the error probability of our initial 3-party protocol $P$ is at most
half the $\varepsilon$ of Theorem 13. View the $n$-bit inputs of protocol $P$ as consisting of $t$ consecutive blocks of
$n/t$ bits each. We will restrict attention to inputs $z = z_1 \ldots z_t$ where one $z_i$ is all-1, and the other $z_j$ are
all-0. Note that for such a $z$, we have $\mathrm{DISJ}_n(x, y, z) = \mathrm{DISJ}_{n/t}(x_i, y_i)$. Fixing $z$ thus reduces the 3-party
Disjointness on $(x, y, z)$ to 2-party Disjointness on a smaller instance $(x_i, y_i)$. Since Charlie does not see
input $z$, his $c_1$-bit message is independent of $z$. Now by going over all $t$ possible $z$'s, and running their
2-party protocol $t$ times starting from Charlie's message, Alice and Bob obtain a protocol $P'$ that computes
$t$ independent instances of 2-party Disjointness, namely on each of the $t$ inputs $(x_1, y_1), \ldots, (x_t, y_t)$. This
$P'$ uses at most $t c_2$ qubits of communication. For every $x$ and $y$, it follows from linearity of expectation that
the expected number of instances where $P'$ errs is at most $\varepsilon t/2$ (expectation taken over Charlie's message,
and the $t$-fold Alice-Bob protocol). Hence by Markov's inequality, the probability that $P'$ errs on more than
$\varepsilon t$ instances is at most $1/2$. Then for every $x, y$ there exists a $c_1$-bit message $m_{xy}$ such that $P'$, when given
that message to start with, with probability at least $1/2$ correctly computes $1 - \varepsilon$ of all $t$ instances.
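The block reduction used above, $\mathrm{DISJ}_n(x,y,z) = \mathrm{DISJ}_{n/t}(x_i,y_i)$ when block $i$ of $z$ is all-1 and the rest is all-0, can be verified exhaustively for a tiny instance. The sketch below does so for $t = 2$ blocks of 2 bits; the function names and the 0-based block indexing are assumptions for this example.

```python
from itertools import product

def disj3(x, y, z):
    """3-party Disjointness: 1 iff no index has x_i = y_i = z_i = 1."""
    return int(not any(a and b and c for a, b, c in zip(x, y, z)))

def disj2(x, y):
    """2-party Disjointness: 1 iff no index has x_i = y_i = 1."""
    return int(not any(a and b for a, b in zip(x, y)))

def block_selector(t, block_len, i):
    """z = z_1 ... z_t with block i all-1 and every other block all-0."""
    return tuple(1 if j // block_len == i else 0 for j in range(t * block_len))

t, block_len = 2, 2
n = t * block_len
reduction_holds = all(
    disj3(x, y, block_selector(t, block_len, i))
    == disj2(x[i * block_len:(i + 1) * block_len], y[i * block_len:(i + 1) * block_len])
    for x in product((0, 1), repeat=n)
    for y in product((0, 1), repeat=n)
    for i in range(t)
)
```

Because the selector $z$ masks out every block except one, a single message from Charlie (who never sees $z$) can be reused for all $t$ choices of the block.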

* Actually, this bound for the case $\ell = 3$ was already known earlier; see [3].

** Their conference paper had an $\Omega(n^{1/3}/\log n)$ bound, but the journal version [5] managed to get rid of the $\log n$.

Now replace Charlie's $c_1$-bit message by a uniformly random message $m$. Alice and Bob can just
generate this by themselves using shared randomness. This gives a new 2-party protocol $P''$. For each $x, y$,
with probability $2^{-c_1}$ we have $m = m_{xy}$, hence with probability at least $\frac{1}{2} 2^{-c_1}$ the protocol $P''$ correctly
computes $1 - \varepsilon$ of all $t$ instances of Disjointness on $n/t$ bits each. Choosing $t = O(c_1)$ and invoking
Theorem 13 gives a lower bound on the communication in $P''$: $t c_2 = \Omega(t\sqrt{n/t})$. Hence $c_2 = \Omega(\sqrt{n/c_1})$.
The overall communication of the original 3-party protocol $P$ is

\[ c_1 + c_2 = c_1 + \Omega(\sqrt{n/c_1}) = \Omega(n^{1/3}) \]

(the minimizing value is $t = n^{1/3}$).
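The last step is a simple balancing argument, which can be written out as follows (ignoring the constants hidden in the $\Omega(\cdot)$, which is an assumption made to keep the sketch short):

```latex
\[
c_1 + \sqrt{n/c_1} \;\ge\; \max\left\{c_1,\ \sqrt{n/c_1}\right\} \;\ge\; n^{1/3},
\]
% Case c_1 >= n^{1/3}: the first term already suffices.
% Case c_1 <  n^{1/3}: then sqrt(n / c_1) > sqrt(n / n^{1/3}) = sqrt(n^{2/3}) = n^{1/3}.
\[
\text{and at } c_1 = n^{1/3}: \qquad c_1 + \sqrt{n/c_1} = 2\,n^{1/3}.
\]
```

So the two terms are balanced, up to constants, when $c_1$ (and hence $t$) is of order $n^{1/3}$.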

This generalizes the bound of Beame et al. to the case where we allow Alice and Bob to send each
other qubits. Note that this bound is tight for our restricted set of $z$'s, since Alice and Bob know $z$ and
can compute the 2-party Disjointness on the relevant $(x_i, y_i)$ in $O(\sqrt{n^{2/3}}) = O(n^{1/3})$ qubits of two-way
communication without help from Charlie, using the optimal quantum protocol for 2-party Disjointness [1].

A.2 Communication-type $C \to B \to A$

Now consider an even more restricted type of communication: Charlie sends a classical message to Bob,
then Bob sends a quantum message to Alice, and Alice computes the output. We can use a similar argument
as before, dividing the inputs into $t = O(n^{1/2})$ equal-sized blocks instead of $O(n^{1/3})$ equal-sized blocks.
If we now replace the two-way SDPT (Theorem 13) by the new one-way SDPT (Theorem 10), we obtain a
lower bound of $\Omega(\sqrt{n})$ for 3-party bounded-error protocols for Disjointness of this restricted type.

Remark. If Charlie's message is quantum as well, then the same approach works, except we need to
reduce the error of the protocol to $\ll 1/t$ at a multiplicative cost of $O(\log t) = O(\log n)$ to both $c_1$ and $c_2$
(Charlie's one quantum message needs to be reused $t$ times). This worsens the two communication lower
bounds to $\Omega(n^{1/3}/\log n)$ and $\Omega(\sqrt{n}/\log n)$ qubits, respectively.

B Massaging locally decodable codes to a special form 

In this appendix we justify the special decoding-format of LDCs claimed after Definition 3. First, it will be
convenient to switch to the notion of a smooth code, introduced by Katz and Trevisan [32].

Definition 4. $C: \{\pm 1\}^n \to \{\pm 1\}^N$ is a $(q, c, \varepsilon)$-smooth code if there is a randomized decoding algorithm
$A$ such that

1. $A$ makes at most $q$ queries, non-adaptively.

2. For all $x \in \{\pm 1\}^n$ and $i \in [n]$ we have $\Pr[A^{C(x)}(i) = x_i] \ge 1/2 + \varepsilon$.

3. For all $x \in \{\pm 1\}^n$, $i \in [n]$, and $j \in [N]$, the probability that on input $i$ algorithm $A$ queries index $j$
is at most $c/N$.

Note that smooth codes only require good decoding on codewords $C(x)$, not on $y$ that are close to $C(x)$.
Katz and Trevisan [32, Theorem 1] established the following connection:

Theorem 14 ([32]). A $(q, \delta, \varepsilon)$-LDC is a $(q, q/\delta, \varepsilon)$-smooth code.

Proof: Let $C$ be a $(q, \delta, \varepsilon)$-LDC and $A$ be its $q$-query decoder. For each $i \in [n]$, let $p_i(j)$ be the probability
that on input $i$, algorithm $A$ queries index $j$. Let $H_i = \{j \mid p_i(j) > q/(\delta N)\}$. Then $|H_i| \le \delta N$, because
$A$ makes no more than $q$ queries. Let $B$ be the decoder that simulates $A$, except that on input $i$ it does not
make queries to $j \in H_i$, but instead acts as if those bits of its oracle are 0. Then $B$ does not query any $j$ with
probability greater than $q/(\delta N)$. Also, $B$'s behavior on input $i$ and oracle $C(x)$ is the same as $A$'s behavior
on input $i$ and the oracle $y$ that is obtained by setting the $H_i$-indices of $C(x)$ to 0. Since $y$ has distance at
most $|H_i| \le \delta N$ from $C(x)$, we have $\Pr[B^{C(x)}(i) = x_i] = \Pr[A^y(i) = x_i] \ge 1/2 + \varepsilon$. ■
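The counting behind $|H_i| \le \delta N$ is that the query probabilities $p_i(j)$ sum to at most $q$, so only few indices can exceed the threshold $q/(\delta N)$. The sketch below computes the heavy set for a made-up 2-query decoder; the query distribution and the function name are hypothetical.

```python
from fractions import Fraction

def heavy_indices(query_prob, q, delta):
    """H_i = {j : p_i(j) > q / (delta * N)} for one fixed input i."""
    N = len(query_prob)
    threshold = Fraction(q) / (delta * N)
    return {j for j, p in enumerate(query_prob) if p > threshold}

# Hypothetical 2-query decoder over N = 10 positions: it always queries position 0
# and picks its second query uniformly among positions 1..9.
N, q, delta = 10, 2, Fraction(3, 10)
query_prob = [Fraction(1)] + [Fraction(1, 9)] * 9
H = heavy_indices(query_prob, q, delta)
```

Here only position 0 is heavy, so the modified decoder $B$ would stop querying it and treat that bit as fixed, at the cost of at most $\delta N = 3$ effectively corrupted positions.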

A converse to Theorem 14 also holds: a $(q, c, \varepsilon)$-smooth code is a $(q, \delta, \varepsilon - c\delta)$-LDC, because the
probability that the decoder queries one of $\delta N$ corrupted positions is at most $(c/N)(\delta N) = c\delta$. Hence
LDCs and smooth codes are essentially equivalent, for appropriate choices of the parameters.

Theorem 15 ([32]). Suppose $C: \{\pm 1\}^n \to \{\pm 1\}^N$ is a $(q, c, \varepsilon)$-smooth code. Then for every $i \in [n]$, there
exists a set $M_i$, consisting of at least $\varepsilon N/(cq)$ disjoint sets of at most $q$ elements of $[N]$ each, such that for
every $Q \in M_i$ there exists a function $f_Q: \{\pm 1\}^{|Q|} \to \{\pm 1\}$ with the property

\[ \mathbb{E}_x[f_Q(C(x)_Q)\, x_i] \ge \varepsilon. \]

Here $C(x)_Q$ is the restriction of $C(x)$ to the bits in $Q$, and the expectation is uniform over all $x \in \{\pm 1\}^n$.

Proof: Fix some $i \in [n]$. Without loss of generality we assume that to decode $x_i$, the decoder picks some
set $Q \subseteq [N]$ (of at most $q$ indices) with probability $p(Q)$, queries those bits, and then outputs a random
variable (not yet a function) $f_Q(C(x)_Q) \in \{\pm 1\}$ that depends on the query-answers. Call such a $Q$ "good"
if

\[ \Pr_x[f_Q(C(x)_Q) = x_i] \ge 1/2 + \varepsilon/2. \]

Equivalently, $Q$ is good if

\[ \mathbb{E}_x[f_Q(C(x)_Q)\, x_i] \ge \varepsilon. \]

Now consider the hypergraph $H_i = (V, E_i)$ with vertex-set $V = [N]$ and edge-set $E_i$ consisting of all good
sets $Q$. The probability that the decoder queries some $Q \in E_i$ is $p(E_i) := \sum_{Q \in E_i} p(Q)$. If it queries some
$Q \in E_i$ then $\mathbb{E}_x[f_Q(C(x)_Q)\, x_i] \le 1$, and if it queries some $Q \notin E_i$ then $\mathbb{E}_x[f_Q(C(x)_Q)\, x_i] < \varepsilon$. Since the
overall probability of outputting $x_i$ is at least $1/2 + \varepsilon$ for every $x$, we have

\[ 2\varepsilon \le \mathbb{E}_{x,Q}[f_Q(C(x)_Q)\, x_i] < p(E_i) \cdot 1 + (1 - p(E_i))\,\varepsilon = \varepsilon + p(E_i)(1 - \varepsilon), \]

hence

\[ p(E_i) > \varepsilon/(1 - \varepsilon) \ge \varepsilon. \]

Since $C$ is smooth, for every $j \in [N]$ we have

\[ \sum_{Q \in E_i:\, j \in Q} p(Q) \le \sum_{Q:\, j \in Q} p(Q) = \Pr[A \text{ queries } j] \le \frac{c}{N}. \]

A matching of $H_i$ is a set of disjoint $Q \in E_i$. Let $M_i$ be a matching in $H_i$ of maximal size. Our goal is to
show $|M_i| \ge \varepsilon N/(cq)$. Define $T = \bigcup_{Q \in M_i} Q$. This set $T$ has at most $q|M_i|$ elements, and intersects each
$Q \in E_i$ (otherwise $M_i$ would not be maximal). We now lower bound the size of $M_i$ as follows:

\[ \varepsilon < p(E_i) = \sum_{Q \in E_i} p(Q) \overset{(*)}{\le} \sum_{j \in T} \sum_{Q \in E_i:\, j \in Q} p(Q) \le \frac{c\,|T|}{N} \le \frac{c\,q\,|M_i|}{N}, \]

where $(*)$ holds because each $Q \in E_i$ is counted exactly once on the left and at least once on the right (since
$T$ intersects each $Q \in E_i$). Hence $|M_i| \ge \varepsilon N/(cq)$. It remains to turn the random variables $f_Q(C(x)_Q)$
into fixed values in $\{\pm 1\}$; it is easy to see that this can always be done without reducing the correlation
$\mathbb{E}_x[f_Q(C(x)_Q)\, x_i]$. ■
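The only property of $M_i$ that the counting argument uses is that no further good set can be added, so that $T$ meets every edge. The sketch below builds such an inclusion-maximal matching greedily; this greedy procedure is our substitution for illustration (the proof itself takes a matching of maximal size), and the list of "good" sets is made up.

```python
def greedy_maximal_matching(edges):
    """Greedily collect pairwise disjoint edges; the result cannot be extended."""
    matching, covered = [], set()
    for edge in edges:
        if covered.isdisjoint(edge):
            matching.append(edge)
            covered.update(edge)
    return matching, covered

# Hypothetical "good" query sets over N = 8 positions, each of size at most q = 2.
q = 2
good_sets = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (0, 7)]
matching, T = greedy_maximal_matching(good_sets)

every_edge_hit = all(not T.isdisjoint(edge) for edge in good_sets)
```

For this input the matching is `[(0, 1), (2, 3), (4, 5), (6, 7)]`, the set $T$ has at most $q|M_i|$ elements, and every good set intersects $T$, which is exactly what step $(*)$ needs.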

The previous theorem establishes that the decoder can just pick a uniformly random element $Q \in M_i$,
and then continue as the original decoder would on those queries, at the expense of reducing the average
success probability by a factor 2. In principle, the decoder could output any function of the $|Q|$ queried bits
that it wants. We now show (along the lines of [33, Lemma 2]) that we can restrict attention to parities (or
their negations), at the expense of decreasing the average success probability by another factor of $2^q$.

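Theorem 16. Suppose $C: \{\pm 1\}^n \to \{\pm 1\}^N$ is a $(q, c, \varepsilon)$-smooth code. Then for every $i \in [n]$ there exists
a set $M_i$, consisting of at least $\varepsilon N/(cq)$ disjoint sets of at most $q$ elements of $[N]$ each, such that for every
$Q \in M_i$ there exists an $a_{i,Q} \in \{\pm 1\}$ with the property that

\[ \mathbb{E}_x\Big[a_{i,Q}\, x_i \prod_{j \in Q} C(x)_j\Big] \ge \frac{\varepsilon}{2^q}. \]

Proof: Fix $i \in [n]$ and take the set $M_i$ produced by Theorem 15. For every $Q \in M_i$ we have

\[ \mathbb{E}_x[f_Q(C(x)_Q)\, x_i] \ge \varepsilon. \]

We would like to turn the functions $f_Q: \{\pm 1\}^{|Q|} \to \{\pm 1\}$ into parity functions. Consider the Fourier
transform of $f_Q$: for $S \subseteq [|Q|]$ and $z \in \{\pm 1\}^{|Q|}$, define parity function $\chi_S(z) = \prod_{j \in S} z_j$ and Fourier
coefficient $\widehat{f_Q}(S) = \frac{1}{2^{|Q|}} \sum_z f_Q(z)\chi_S(z)$. Then we can write

\[ f_Q = \sum_{S} \widehat{f_Q}(S)\, \chi_S. \]

Using that $\widehat{f_Q}(S) \in [-1, 1]$ for all $S$, we have

\[ \varepsilon \le \mathbb{E}_x[f_Q(C(x)_Q)\, x_i] = \sum_{S} \widehat{f_Q}(S)\, \mathbb{E}_x[x_i\, \chi_S(C(x)_Q)] \le \sum_{S} \left|\mathbb{E}_x[x_i\, \chi_S(C(x)_Q)]\right|. \]

Since the right-hand side is the sum of $2^{|Q|}$ terms, there exists an $S$ with $|\mathbb{E}_x[x_i\, \chi_S(C(x)_Q)]| \ge \varepsilon/2^{|Q|}$.
Defining $a_{i,Q} = \mathrm{sign}(\mathbb{E}_x[x_i\, \chi_S(C(x)_Q)]) \in \{\pm 1\}$, we have

\[ \mathbb{E}_x\Big[a_{i,Q}\, x_i \prod_{j \in S} C(x)_j\Big] = \left|\mathbb{E}_x[x_i\, \chi_S(C(x)_Q)]\right| \ge \frac{\varepsilon}{2^{|Q|}} \ge \frac{\varepsilon}{2^q}. \]

The theorem follows by replacing each $Q$ in $M_i$ by the set $S$ just obtained from it. ■
Combining Theorems 14 and 16 gives the decoding-format claimed after Definition 3.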
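The Fourier expansion used in the proof of Theorem 16 can be computed by brute force for small $q$. The sketch below does this for a made-up decoder output function on two queried bits (0-based indices, names of our choosing) and confirms that the function is recovered from its coefficients.

```python
from itertools import combinations, product

def chi(S, z):
    """Parity function chi_S(z) = prod_{j in S} z_j on z in {+1,-1}^q."""
    value = 1
    for j in S:
        value *= z[j]
    return value

def fourier_coefficients(f, q):
    """Return {S: hat f(S)} with hat f(S) = 2^-q * sum_z f(z) chi_S(z)."""
    points = list(product((1, -1), repeat=q))
    subsets = [S for size in range(q + 1) for S in combinations(range(q), size)]
    return {S: sum(f(z) * chi(S, z) for z in points) / 2 ** q for S in subsets}

def and_like(z):
    """Hypothetical decoder output: -1 only if both queried bits are -1."""
    return -1 if z == (-1, -1) else 1

coeffs = fourier_coefficients(and_like, 2)
reconstructs = all(
    sum(c * chi(S, z) for S, c in coeffs.items()) == and_like(z)
    for z in product((1, -1), repeat=2)
)
```

All four coefficients have absolute value $1/2$ here, so each parity carries a share of the correlation; in general the averaging argument only guarantees one parity with a $2^{-|Q|}$ fraction of it.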